Supply Chain Analysis¶

By: Nanda Akhirdianto

Background¶

Supply chain analysis is the process of collecting, analyzing, and interpreting data about a company's supply chain in order to improve its efficiency and effectiveness. This can involve a variety of data, such as product types, SKUs, prices, availability, number of products sold, revenue generated, customer demographics, stock levels, lead times, order quantities, shipping times, shipping carriers, shipping costs, supplier names, locations, lead times, production volumes, manufacturing lead time, manufacturing costs, inspection results, defect rates, transportation modes, routes, and costs.

Purpose of the Project¶

The purpose of the project data is to use it to perform supply chain analysis and identify opportunities for improvement. The data can be used to answer a variety of questions, such as:

  1. What products are most popular?
  2. What are the most profitable products?
  3. What are our stock levels?
  4. What are our lead times?
  5. How much are we spending on shipping?
  6. What are our supplier costs?
  7. What are our manufacturing costs?
  8. What are our defect rates?
  9. How can we improve our transportation?

The answers to these questions can help us make better decisions about our supply chain, such as:

  1. Which products should we focus on?
  2. Where should we open new warehouses?
  3. Which suppliers should we work with?
  4. How can we improve our manufacturing process?
  5. How can we reduce our shipping costs?
  6. How can we improve our customer service?

Dataset Information¶

We got the dataset that we used here, and this project was inspired by the data analysis project by Aman Kharwal. The dataset information is as follows:

  • Product Type: The type of product, such as clothing, electronics, or food.
  • SKU: The unique identifier for a product.
  • Price: The price of the product.
  • Availability: Whether the product is in stock or out of stock.
  • Number of products sold: The number of products that have been sold in a given period of time.
  • Revenue generated: The total amount of money that has been generated from the sale of products in a given period of time.
  • Customer demographics: The demographic information about customers who have purchased products, such as age, gender, location, and income.
  • Stock levels: The number of products that are currently in stock.
  • Lead times: The amount of time it takes to receive a product from a supplier.
  • Order quantities: The number of products that are typically ordered in a single order.
  • Shipping times: The amount of time it takes to ship a product to a customer.
  • Shipping carriers: The shipping carriers that are used to ship products to customers.
  • Shipping costs: The costs associated with shipping products to customers.
  • Supplier name: The name of the supplier who provides a product.
  • Location: The location of the supplier.
  • Lead time: The amount of time it takes for a supplier to produce a product.
  • Production volumes: The number of products that a supplier can produce in a given period of time.
  • Manufacturing lead time: The amount of time it takes for a supplier to manufacture a product.
  • Manufacturing costs: The costs associated with manufacturing a product.
  • Inspection results: The results of inspections that are performed on products before they are shipped to customers.
  • Defect rates: The percentage of products that are defective.
  • Transportation modes: The modes of transportation that are used to ship products to customers.
  • Routes: The routes that are used to ship products to customers.
  • Costs: The costs associated with transporting products to customers.

This dataset provides a comprehensive overview of a company's supply chain. By analyzing this data, businesses can identify opportunities to improve their efficiency and effectiveness, and ultimately, their bottom line.

Role and Responsibility¶

My role in this project is as a Data Analyst, who is responsible for carrying out the following tasks, including data interpretation, data transformation, data cleaning, and conducting descriptive analysis and visualization to get the insights needed.

Setup¶

In this first process, we need to define all the tools to execute this project. We are using Jupyter Notebook that has built in Python as well as functional libraries to work with data, such as Pandas, NumPy, Plotly, Matplotlib and more, and also it super easy to use. We can definitely see the result once we run our script on each cell.

In [1]:
# Import all necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"

%matplotlib notebook
In [2]:
# Load data
data = pd.read_csv('Downloads/supply_chain_data.csv')
In [3]:
data.head()
Out[3]:
Product type SKU Price Availability Number of products sold Revenue generated Customer demographics Stock levels Lead times Order quantities ... Location Lead time Production volumes Manufacturing lead time Manufacturing costs Inspection results Defect rates Transportation modes Routes Costs
0 haircare SKU0 69.808006 55 802 8661.996792 Non-binary 58 7 96 ... Mumbai 29 215 29 46.279879 Pending 0.226410 Road Route B 187.752075
1 skincare SKU1 14.843523 95 736 7460.900065 Female 53 30 37 ... Mumbai 23 517 30 33.616769 Pending 4.854068 Road Route B 503.065579
2 haircare SKU2 11.319683 34 8 9577.749626 Unknown 1 10 88 ... Mumbai 12 971 27 30.688019 Pending 4.580593 Air Route C 141.920282
3 skincare SKU3 61.163343 68 83 7766.836426 Non-binary 23 13 59 ... Kolkata 24 937 18 35.624741 Fail 4.746649 Rail Route A 254.776159
4 skincare SKU4 4.805496 26 871 2686.505152 Non-binary 5 3 56 ... Delhi 5 414 3 92.065161 Fail 3.145580 Air Route A 923.440632

5 rows × 24 columns

In [4]:
data.tail()
Out[4]:
Product type SKU Price Availability Number of products sold Revenue generated Customer demographics Stock levels Lead times Order quantities ... Location Lead time Production volumes Manufacturing lead time Manufacturing costs Inspection results Defect rates Transportation modes Routes Costs
95 haircare SKU95 77.903927 65 672 7386.363944 Unknown 15 14 26 ... Mumbai 18 450 26 58.890686 Pending 1.210882 Air Route A 778.864241
96 cosmetics SKU96 24.423131 29 324 7698.424766 Non-binary 67 2 32 ... Mumbai 28 648 28 17.803756 Pending 3.872048 Road Route A 188.742141
97 haircare SKU97 3.526111 56 62 4370.916580 Male 46 19 4 ... Mumbai 10 535 13 65.765156 Fail 3.376238 Road Route A 540.132423
98 skincare SKU98 19.754605 43 913 8525.952560 Female 53 1 27 ... Chennai 28 581 9 5.604691 Pending 2.908122 Rail Route A 882.198864
99 haircare SKU99 68.517833 17 627 9185.185829 Unknown 55 8 59 ... Chennai 29 921 2 38.072899 Fail 0.346027 Rail Route B 210.743009

5 rows × 24 columns

In [5]:
# Get slight info from our data
print(data.info())
print(data.dtypes)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 24 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Product type             100 non-null    object 
 1   SKU                      100 non-null    object 
 2   Price                    100 non-null    float64
 3   Availability             100 non-null    int64  
 4   Number of products sold  100 non-null    int64  
 5   Revenue generated        100 non-null    float64
 6   Customer demographics    100 non-null    object 
 7   Stock levels             100 non-null    int64  
 8   Lead times               100 non-null    int64  
 9   Order quantities         100 non-null    int64  
 10  Shipping times           100 non-null    int64  
 11  Shipping carriers        100 non-null    object 
 12  Shipping costs           100 non-null    float64
 13  Supplier name            100 non-null    object 
 14  Location                 100 non-null    object 
 15  Lead time                100 non-null    int64  
 16  Production volumes       100 non-null    int64  
 17  Manufacturing lead time  100 non-null    int64  
 18  Manufacturing costs      100 non-null    float64
 19  Inspection results       100 non-null    object 
 20  Defect rates             100 non-null    float64
 21  Transportation modes     100 non-null    object 
 22  Routes                   100 non-null    object 
 23  Costs                    100 non-null    float64
dtypes: float64(6), int64(9), object(9)
memory usage: 18.9+ KB
None
Product type                object
SKU                         object
Price                      float64
Availability                 int64
Number of products sold      int64
Revenue generated          float64
Customer demographics       object
Stock levels                 int64
Lead times                   int64
Order quantities             int64
Shipping times               int64
Shipping carriers           object
Shipping costs             float64
Supplier name               object
Location                    object
Lead time                    int64
Production volumes           int64
Manufacturing lead time      int64
Manufacturing costs        float64
Inspection results          object
Defect rates               float64
Transportation modes        object
Routes                      object
Costs                      float64
dtype: object
In [6]:
# Getting to know how our data looks in statistical
data.describe()
Out[6]:
Price Availability Number of products sold Revenue generated Stock levels Lead times Order quantities Shipping times Shipping costs Lead time Production volumes Manufacturing lead time Manufacturing costs Defect rates Costs
count 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.00000 100.000000 100.000000 100.000000
mean 49.462461 48.400000 460.990000 5776.048187 47.770000 15.960000 49.220000 5.750000 5.548149 17.080000 567.840000 14.77000 47.266693 2.277158 529.245782
std 31.168193 30.743317 303.780074 2732.841744 31.369372 8.785801 26.784429 2.724283 2.651376 8.846251 263.046861 8.91243 28.982841 1.461366 258.301696
min 1.699976 1.000000 8.000000 1061.618523 0.000000 1.000000 1.000000 1.000000 1.013487 1.000000 104.000000 1.00000 1.085069 0.018608 103.916248
25% 19.597823 22.750000 184.250000 2812.847151 16.750000 8.000000 26.000000 3.750000 3.540248 10.000000 352.000000 7.00000 22.983299 1.009650 318.778455
50% 51.239831 43.500000 392.500000 6006.352023 47.500000 17.000000 52.000000 6.000000 5.320534 18.000000 568.500000 14.00000 45.905622 2.141863 520.430444
75% 77.198228 75.000000 704.250000 8253.976921 73.000000 24.000000 71.250000 8.000000 7.601695 25.000000 797.000000 23.00000 68.621026 3.563995 763.078231
max 99.171329 100.000000 996.000000 9866.465458 100.000000 30.000000 96.000000 10.000000 9.929816 30.000000 985.000000 30.00000 99.466109 4.939255 997.413450

Data Cleaning¶

In order to ensuring data cleanliness, we're conducting these steps to find if there's duplicate or any anomali in our data that we didn't capture in previous step.

In [7]:
# Review for duplicate and missing data
duplicate_rows = data[data.duplicated()]
missing_values = data.isnull().sum()

# Showing the result
print(duplicate_rows)
print(missing_values)
Empty DataFrame
Columns: [Product type, SKU, Price, Availability, Number of products sold, Revenue generated, Customer demographics, Stock levels, Lead times, Order quantities, Shipping times, Shipping carriers, Shipping costs, Supplier name, Location, Lead time, Production volumes, Manufacturing lead time, Manufacturing costs, Inspection results, Defect rates, Transportation modes, Routes, Costs]
Index: []

[0 rows x 24 columns]
Product type               0
SKU                        0
Price                      0
Availability               0
Number of products sold    0
Revenue generated          0
Customer demographics      0
Stock levels               0
Lead times                 0
Order quantities           0
Shipping times             0
Shipping carriers          0
Shipping costs             0
Supplier name              0
Location                   0
Lead time                  0
Production volumes         0
Manufacturing lead time    0
Manufacturing costs        0
Inspection results         0
Defect rates               0
Transportation modes       0
Routes                     0
Costs                      0
dtype: int64

Conduct Descriptive Analysis¶

In this process we're trying to answer these question to better understand how our data has done. The process will execute using some modules from pandas.

In [8]:
# What products are most popular?
popular = data.groupby('Product type')['Number of products sold'].sum().sort_values(ascending=False)
print('The most popular products are:')
print(popular)

# What are the most profitable products?
profitable = data.groupby('Product type')['Revenue generated'].sum().sort_values(ascending=False)
print('The most profitable products are:')
print(profitable)

# What are our stock levels?
stock_levels = data.groupby('Product type')['Stock levels'].sum()
print('Our stock level are:')
print(stock_levels)
The most popular products are:
Product type
skincare     20731
haircare     13611
cosmetics    11757
Name: Number of products sold, dtype: int64
The most profitable products are:
Product type
skincare     241628.162133
haircare     174455.390605
cosmetics    161521.265999
Name: Revenue generated, dtype: float64
Our stock level are:
Product type
cosmetics    1525
haircare     1644
skincare     1608
Name: Stock levels, dtype: int64
In [9]:
# What are our lead times?
lead_times = data.groupby('Product type')['Lead times'].mean()
print('Our lead times are:')
print(lead_times)

# How much are we spending o shipping?
shipping_costs = data['Shipping costs'].sum()
print('We are spending ${} on shipping.'.format(shipping_costs))

# What are our supplier costs?
supplier_costs = data.groupby('Supplier name')['Costs'].sum()
print("\nSupplier Costs:")
print(supplier_costs)
Our lead times are:
Product type
cosmetics    15.384615
haircare     15.529412
skincare     16.700000
Name: Lead times, dtype: float64
We are spending $554.8149072019588 on shipping.

Supplier Costs:
Supplier name
Supplier 1    15520.980745
Supplier 2    11330.599298
Supplier 3     7032.001523
Supplier 4     9392.587517
Supplier 5     9648.409132
Name: Costs, dtype: float64
In [10]:
# What are our manufacturing costs?
manufacturing_costs = data["Manufacturing costs"].sum()
print("We are spending ${} on manufacturing.".format(manufacturing_costs))

# What are our defect rates?
defect_rates = data["Defect rates"].mean()
print("Our defect rate is {}%.".format(defect_rates * 100))

# How can we improve our transportation?
transportation_modes = data['Transportation modes'].unique()
transportation_strategies = [
    "Using more efficient shipping carriers.",
    "Negotiating better shipping rates with our carriers.",
    "Consolidating our shipments to reduce shipping costs.",
    "Using more sustainable transportation methods."
]
sorted_transportation_strategies = sorted(transportation_strategies)
sorted_optimal_transportation_strategy = "\n".join(sorted_transportation_strategies)
print("Recommended Transportation Strategy:\n", sorted_optimal_transportation_strategy)
We are spending $4726.669324146992 on manufacturing.
Our defect rate is 227.71579927396098%.
Recommended Transportation Strategy:
 Consolidating our shipments to reduce shipping costs.
Negotiating better shipping rates with our carriers.
Using more efficient shipping carriers.
Using more sustainable transportation methods.

After we gain insight by answering a number of questions regarding each of the information provided by the data, then we need to do a descriptive analysis to answer some follow-up questions to find out the relationship of each variable that we hope can get deeper insights from this data.

In [11]:
# Here to analyze the relationship between the product's price and the revenue generated
fig = px.scatter(data, x='Price',
                y='Revenue generated',
                color='Product type',
                hover_data=['Number of products sold'],
                trendline='ols')
fig.show()

The visualization shows that companies earn more revenue from skincare product lines, with a linear detail where the higher the price of skincare the higher the revenue they can get.

Next, we visualize the sales of product.

In [12]:
# Plotting into pie chart
sales_data = data.groupby('Product type')['Number of products sold'].sum().reset_index()
pie_chart = px.pie(sales_data, values='Number of products sold',
                  names='Product type',
                  title='Sales by Product Type',
                  hover_data=['Number of products sold'],
                  hole=0.5,
                  color_discrete_sequence=px.colors.qualitative.Pastel)

pie_chart.update_traces(textposition='inside',
                       textinfo='percent+label')
pie_chart.show()

45% of total sales came from skin care, 29.5% from hair care, and 25.5% from cosmetics.

Next, we will see how the distribution of revenue generated by shipping carriers.

In [13]:
total_revenue = data.groupby('Shipping carriers')['Revenue generated'].sum().reset_index()

fig = go.Figure()
fig.add_trace(go.Bar(x=total_revenue['Shipping carriers'],
                    y=total_revenue['Revenue generated']))
fig.update_layout(title='Total Revenue by Shipping Carrie',
                 xaxis_title= 'Shipping Carrier',
                 yaxis_title='Revenue Generated')
fig.show()

After observed from the data above, from three available carriers, carrier B is a shipping carrier that helps companies earn more revenue.

Now let's have a work to get to know the Average lead time and Average Manufacturing Costs for all products of the company.

In [14]:
# Here to see how is the average lead time and mfg. costs look like
lead_time_avg = data.groupby('Product type')['Lead time'].mean().reset_index()
mfg_costs_avg = data.groupby('Product type')['Manufacturing costs'].mean().reset_index()

# Define and show the result
result = pd.merge(lead_time_avg, mfg_costs_avg, on='Product type')
result.rename(columns={'Lead time': 'Average Lead Time', 'Manufacturing Costs': 'Average Manufacturing Costs'}, inplace=True)
print(result)
  Product type  Average Lead Time  Manufacturing costs
0    cosmetics          13.538462            43.052740
1     haircare          18.705882            48.457993
2     skincare          18.000000            48.993157

Let's continue to analyze the distribution of each SKU in terms of revenue generated, stock level and order quantity.

In [15]:
# Number of SKU x revenue generated
revenue_chart = px.line(data, x='SKU', y='Revenue generated', title='Revenue generated by SKU')
revenue_chart.show()
In [16]:
# Number of SKU x stock levels
stock_chart = px.line(data, x='SKU', y='Stock levels', title='Stock levels by SKU')
stock_chart.show()
In [17]:
# Number of SKU x order quantity
order_quantity_chart = px.bar(data, x='SKU', y='Order quantities', title='Order Quantity by SKU')
order_quantity_chart.show()

As part of analyzing cost, we want to see the cost distribution in terms of shipping carriers and transportation mode.

In [18]:
# Distrubution shipping carriers x shipping costs
shipping_cost_chart = px.bar(data, x='Shipping carriers',
                            y='Shipping costs',
                            title='Distribution of Shipping Costs by Carriers')
shipping_cost_chart.show()

From the previous process by analyze the shipping carriers revenue generated, we've seen that carrier B was the highest top choice and helped company to get more revenue, but also as shown on the above visualization, carrier B also the most costly carrier among the three choices.

How about the cost distribution by transportation mode?

In [19]:
transport_chart = px.pie(data, values='Costs',
                        names='Transportation modes',
                        title='Cost Distribution by Transportation Modes',
                        hole=0.5,
                        color_discrete_sequence=px.colors.qualitative.Pastel)
transport_chart.show()

The company spends more on land routes, with road and rail modes. The distribution shows that 30.3% of costs are incurred by road mode, and 28.7% by rail mode.

Next we will analyze the level of defects. This is important to analyze, because in every shipment of goods or products, sometimes we found the defect one. We will analyze the average defect rate for each product, and which mode of transportation tends to occur frequently.

In [20]:
defect_rates_by_product = data.groupby('Product type')['Defect rates'].mean().reset_index()

# Ploting into bar chart
fig = px.bar(defect_rates_by_product,
            x='Product type',
            y='Defect rates',
            title='Average Defect Rates by Product Type')
fig.show()
In [21]:
# Defect rates bt transportation mode
pivot_table = pd.pivot_table(data, values='Defect rates',
                            index=['Transportation modes'],
                            aggfunc='mean')

# Plotting into pie chart
transportation_chart = px.pie(values=pivot_table['Defect rates'],
                             names=pivot_table.index,
                             title='Defect Rates by Transportation Modes',
                             hole=0.5,
                             color_discrete_sequence=px.colors.qualitative.Pastel)
transportation_chart.show()

From the visualization shown above, the defect rate that occurs on average for haircare product lines has the highest level. Meanwhile, the land route, or more precisely the road, is the mode of transportation that tends to have the highest rate of defect.

Key Findings¶

Following are some important findings after carrying out the entire analysis process:

  1. The most popular product line is Skincare, followed by Haircare product line.
  2. The 2 most popular product lines are also followed as the most profitable products.
  3. Total spending on shipping is USD554,814,907.
  4. Supplier 1 becomes the supplier with the highest cost.
  5. Total spending for manufacturing financing is USD4726.669324
  6. Companies get more revenue from skincare product lines. This is also reinforced by the percentage of total sales of 45%.
  7. Out of the three Carriers available, Carrier B is a delivery operator that helps companies earn more revenue.
  8. SKU51 became the SKU that helped the company earn the most revenue, amounting to USD9.86647K. SKU 12, SKU51 and SKU59 both have a stock level of 100.
  9. SKU0 and SKU20 have the highest order quantity, which is 96.
  10. Apart from being a delivery service provider that helps to earn high income, Carrier B is also the carrier with the highest fees.
  11. Companies spending more on land routes.
  12. The pattern also shows that haircare product lines tend to have the highest average defect rate. In addition, because companies tend to use land transportation modes, the choice of this mode shows that defective products are often found.

Summary & Recommendation¶

Earlier this project we already define some questions to be answer through this whole project. So, Here's to summarize it all by answering those questions and provide some recommendation regarding this topic:

  1. Which products should we focus on?

Answer: Companies can focus more on maintaining and developing products from lines that are most popular in the eyes of consumers and those that have high profits, the answer is the Skincare product line.

  1. Which suppliers should we work with?

Answer: Data shows Suppliers 1 and 2 are the top 2 suppliers with the highest costs, it would be better to consider other suppliers (supplier 3 for example) which have more affordable costs, apart from being able to incur lower costs of course the company can make savings.

  1. How can we improve our manufacturing process?

Answer: By looking at the available data, there are several opportunities that companies can use to improve their manufacturing processes. Such as optimizing processes by using more cost-effective materials. Work with suppliers to find ways to improve ingredient quality, reduce shipping costs, or increase delivery times. Consolidate with distributors in the distribution of goods to arrive on time and reduce the level of defects in goods.

  1. How can we reduce our shipping costs?

Answer: Several ways can be done to reduce shipping costs, we can do that by using multiple carriers, consider in using consolidated shipping, mind use dimensional weight pricing, ship during off-peak hours, ensure that the goods are handled and packaged securely to avoid defects caused by the shipping process, and last but not least use free shipping if available.

  1. How can we improve our customer service?

Answer: Ways that can be done to improve service to customers are investing in training, this can be train the people (employee) or other stakeholders, to make sure they trained well to produce excellent product. Gathering feedback from customers, by doing this one, can help company identify areas where they can improve the customer service. Monitoring customer service metrics, by tracking customer satisfaction scores and complaint rates which this will help to measure the effectiveness of the customer service efforts. If needed, use technology as an advantage, this can be done by developing new technology that aimed to direct connection and respons to the customer such as, chatbots, live chat and social media monitoring.

In [ ]: